Overview

Dataset statistics

Number of variables: 11
Number of observations: 936
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 258.5 KiB
Average record size in memory: 282.8 B
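The two memory figures above are consistent with each other. The following is a minimal sketch of the arithmetic, assuming that 1 KiB is 1024 bytes and that the average record size is simply the total size divided by the number of observations; the variable names are illustrative and do not come from the report.

```python
# Values copied from the dataset statistics above.
total_size_kib = 258.5      # "Total size in memory"
n_observations = 936        # "Number of observations"

# Assumption: average record size = total bytes / number of rows.
total_size_bytes = total_size_kib * 1024
avg_record_bytes = total_size_bytes / n_observations

# The total is itself rounded to 0.1 KiB, so the result is approximate.
print(f"{avg_record_bytes:.1f} B")
```

Rounded to one decimal place, this agrees with the reported 282.8 B.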

Variable types

NUM: 8
CAT: 3

Reproduction

Analysis started: 2020-05-10 12:12:52.836389
Analysis finished: 2020-05-10 12:13:27.048553
Version: pandas-profiling v2.6.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Title has a high cardinality: 935 distinct values (High cardinality)
Genre has a high cardinality: 200 distinct values (High cardinality)
Director has a high cardinality: 607 distinct values (High cardinality)
Rank is highly correlated with df_index (High correlation)
df_index is highly correlated with Rank (High correlation)

Variables

df_index
Real number (ℝ≥0)

HIGH CORRELATION
UNIFORM
UNIQUE
Distinct count: 936
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 497.18589743589746
Minimum: 0
Maximum: 999
Zeros: 1
Zeros (%): 0.1%
Memory size: 7.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 52.75
Q1: 245.75
Median: 495.5
Q3: 745.25
95-th percentile: 945.5
Maximum: 999
Range: 999
Interquartile range (IQR): 499.5

Descriptive statistics

Standard deviation: 288.1005611
Coefficient of variation (CV): 0.5794624558
Kurtosis: -1.202928275
Mean: 497.1858974
Median Absolute Deviation (MAD): 249.3747261
Skewness: 0.009336325751
Sum: 465366
Variance: 83001.93332

Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 999.], "bayesian blocks" binning strategy used)

Common values

Value | Count | Frequency (%)
999 | 1 | 0.1%
326 | 1 | 0.1%
339 | 1 | 0.1%
338 | 1 | 0.1%
337 | 1 | 0.1%
336 | 1 | 0.1%
334 | 1 | 0.1%
333 | 1 | 0.1%
332 | 1 | 0.1%
331 | 1 | 0.1%
Other values (926) | 926 | 98.9%

Minimum 5 values

Value | Count | Frequency (%)
0 | 1 | 0.1%
1 | 1 | 0.1%
2 | 1 | 0.1%
3 | 1 | 0.1%
4 | 1 | 0.1%

Maximum 5 values

Value | Count | Frequency (%)
999 | 1 | 0.1%
998 | 1 | 0.1%
997 | 1 | 0.1%
996 | 1 | 0.1%
995 | 1 | 0.1%
 

Rank
Real number (ℝ≥0)

HIGH CORRELATION
UNIFORM
UNIQUE
Distinct count: 936
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 498.18589743589746
Minimum: 1
Maximum: 1000
Zeros: 0
Zeros (%): 0.0%
Memory size: 7.4 KiB

Quantile statistics

Minimum: 1
5-th percentile: 53.75
Q1: 246.75
Median: 496.5
Q3: 746.25
95-th percentile: 946.5
Maximum: 1000
Range: 999
Interquartile range (IQR): 499.5

Descriptive statistics

Standard deviation: 288.1005611
Coefficient of variation (CV): 0.5782993108
Kurtosis: -1.202928275
Mean: 498.1858974
Median Absolute Deviation (MAD): 249.3747261
Skewness: 0.009336325751
Sum: 466302
Variance: 83001.93332

Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 1. 1000.], "bayesian blocks" binning strategy used)

Common values

Value | Count | Frequency (%)
1000 | 1 | 0.1%
327 | 1 | 0.1%
340 | 1 | 0.1%
339 | 1 | 0.1%
338 | 1 | 0.1%
337 | 1 | 0.1%
335 | 1 | 0.1%
334 | 1 | 0.1%
333 | 1 | 0.1%
332 | 1 | 0.1%
Other values (926) | 926 | 98.9%

Minimum 5 values

Value | Count | Frequency (%)
1 | 1 | 0.1%
2 | 1 | 0.1%
3 | 1 | 0.1%
4 | 1 | 0.1%
5 | 1 | 0.1%

Maximum 5 values

Value | Count | Frequency (%)
1000 | 1 | 0.1%
999 | 1 | 0.1%
998 | 1 | 0.1%
997 | 1 | 0.1%
996 | 1 | 0.1%
 

Title
Categorical

HIGH CARDINALITY
UNIFORM
Distinct count: 935
Unique (%): 99.9%
Missing: 0
Missing (%): 0.0%
Memory size: 7.4 KiB

Common values

Value | Count | Frequency (%)
The Host | 2 | 0.2%
Divergent | 1 | 0.1%
Godzilla | 1 | 0.1%
Juno | 1 | 0.1%
Resident Evil: Retribution | 1 | 0.1%
22 Jump Street | 1 | 0.1%
A Kind of Murder | 1 | 0.1%
Easy A | 1 | 0.1%
Oblivion | 1 | 0.1%
Step Up 2: The Streets | 1 | 0.1%
Other values (925) | 925 | 98.8%
 

Length

Max length: 61
Mean length: 14.65384615
Min length: 2

Value | Count | Frequency (%)
Lowercase_Letter | 31 | 39.2%
Uppercase_Letter | 26 | 32.9%
Decimal_Number | 10 | 12.7%
Other_Punctuation | 8 | 10.1%
Close_Punctuation | 1 | 1.3%
Space_Separator | 1 | 1.3%
Dash_Punctuation | 1 | 1.3%
Open_Punctuation | 1 | 1.3%

Value | Count | Frequency (%)
Latin | 57 | 72.2%
Common | 22 | 27.8%

Value | Count | Frequency (%)
ASCII | 74 | 100.0%
 

Genre
Categorical

HIGH CARDINALITY
Distinct count: 200
Unique (%): 21.4%
Missing: 0
Missing (%): 0.0%
Memory size: 7.4 KiB

Common values

Value | Count | Frequency (%)
Action,Adventure,Sci-Fi | 50 | 5.3%
Drama | 43 | 4.6%
Comedy,Drama,Romance | 32 | 3.4%
Comedy | 30 | 3.2%
Drama,Romance | 28 | 3.0%
Animation,Adventure,Comedy | 26 | 2.8%
Action,Adventure,Fantasy | 26 | 2.8%
Comedy,Drama | 25 | 2.7%
Comedy,Romance | 25 | 2.7%
Crime,Drama,Mystery | 22 | 2.4%
Other values (190) | 629 | 67.2%
 

Length

Max length: 26
Mean length: 18.20512821
Min length: 5

Value | Count | Frequency (%)
Lowercase_Letter | 18 | 58.1%
Uppercase_Letter | 11 | 35.5%
Dash_Punctuation | 1 | 3.2%
Other_Punctuation | 1 | 3.2%

Value | Count | Frequency (%)
Latin | 29 | 93.5%
Common | 2 | 6.5%

Value | Count | Frequency (%)
ASCII | 31 | 100.0%
 

Director
Categorical

HIGH CARDINALITY
UNIFORM
Distinct count: 607
Unique (%): 64.9%
Missing: 0
Missing (%): 0.0%
Memory size: 7.4 KiB

Common values

Value | Count | Frequency (%)
Ridley Scott | 8 | 0.9%
M. Night Shyamalan | 6 | 0.6%
Michael Bay | 6 | 0.6%
David Yates | 6 | 0.6%
Paul W.S. Anderson | 6 | 0.6%
Peter Berg | 5 | 0.5%
Justin Lin | 5 | 0.5%
Danny Boyle | 5 | 0.5%
Denis Villeneuve | 5 | 0.5%
Antoine Fuqua | 5 | 0.5%
Other values (597) | 879 | 93.9%
 

Length

Max length: 32
Mean length: 13.13782051
Min length: 3

Value | Count | Frequency (%)
Lowercase_Letter | 38 | 55.1%
Uppercase_Letter | 27 | 39.1%
Other_Punctuation | 2 | 2.9%
Dash_Punctuation | 1 | 1.4%
Space_Separator | 1 | 1.4%

Value | Count | Frequency (%)
Latin | 65 | 94.2%
Common | 4 | 5.8%

Value | Count | Frequency (%)
ASCII | 56 | 100.0%
 

Year
Real number (ℝ≥0)

Distinct count: 11
Unique (%): 1.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2012.7713675213674
Minimum: 2006
Maximum: 2016
Zeros: 0
Zeros (%): 0.0%
Memory size: 7.4 KiB

Quantile statistics

Minimum: 2006
5-th percentile: 2007
Q1: 2010
Median: 2014
Q3: 2016
95-th percentile: 2016
Maximum: 2016
Range: 10
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 3.178987268
Coefficient of variation (CV): 0.001579408034
Kurtosis: -0.8070367081
Mean: 2012.771368
Median Absolute Deviation (MAD): 2.726020893
Skewness: -0.6863119763
Sum: 1883954
Variance: 10.10596005

Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[2006. 2012.5 2015.5 2016. ], "bayesian blocks" binning strategy used)

Common values

Value | Count | Frequency (%)
2016 | 268 | 28.6%
2015 | 123 | 13.1%
2014 | 95 | 10.1%
2013 | 86 | 9.2%
2012 | 62 | 6.6%
2010 | 59 | 6.3%
2011 | 58 | 6.2%
2009 | 49 | 5.2%
2008 | 49 | 5.2%
2007 | 46 | 4.9%

Minimum 5 values

Value | Count | Frequency (%)
2006 | 41 | 4.4%
2007 | 46 | 4.9%
2008 | 49 | 5.2%
2009 | 49 | 5.2%
2010 | 59 | 6.3%

Maximum 5 values

Value | Count | Frequency (%)
2016 | 268 | 28.6%
2015 | 123 | 13.1%
2014 | 95 | 10.1%
2013 | 86 | 9.2%
2012 | 62 | 6.6%
 

Runtime
Real number (ℝ≥0)

Distinct count: 92
Unique (%): 9.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 113.2724358974359
Minimum: 66
Maximum: 187
Zeros: 0
Zeros (%): 0.0%
Memory size: 7.4 KiB

Quantile statistics

Minimum: 66
5-th percentile: 88
Q1: 100
Median: 111
Q3: 123
95-th percentile: 149
Maximum: 187
Range: 121
Interquartile range (IQR): 23

Descriptive statistics

Standard deviation: 18.55079827
Coefficient of variation (CV): 0.1637715135
Kurtosis: 0.6336593054
Mean: 113.2724359
Median Absolute Deviation (MAD): 14.55930857
Skewness: 0.7911194262
Sum: 106023
Variance: 344.1321164

Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 66. 80.5 84.5 91.5 120.5 133.5 144.5 165.5 187. ], "bayesian blocks" binning strategy used)

Common values

Value | Count | Frequency (%)
108 | 29 | 3.1%
117 | 26 | 2.8%
100 | 26 | 2.8%
110 | 25 | 2.7%
118 | 25 | 2.7%
102 | 25 | 2.7%
106 | 24 | 2.6%
104 | 22 | 2.4%
112 | 22 | 2.4%
101 | 21 | 2.2%
Other values (82) | 691 | 73.8%

Minimum 5 values

Value | Count | Frequency (%)
66 | 1 | 0.1%
73 | 1 | 0.1%
80 | 2 | 0.2%
81 | 4 | 0.4%
82 | 1 | 0.1%

Maximum 5 values

Value | Count | Frequency (%)
187 | 1 | 0.1%
180 | 2 | 0.2%
172 | 1 | 0.1%
170 | 1 | 0.1%
169 | 3 | 0.3%
 

Rating
Real number (ℝ≥0)

Distinct count: 55
Unique (%): 5.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.729166666666667
Minimum: 1.9
Maximum: 9.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 7.4 KiB

Quantile statistics

Minimum: 1.9
5-th percentile: 5.175
Q1: 6.2
Median: 6.8
Q3: 7.4
95-th percentile: 8.1
Maximum: 9
Range: 7.1
Interquartile range (IQR): 1.2

Descriptive statistics

Standard deviation: 0.9352249579
Coefficient of variation (CV): 0.1389807987
Kurtosis: 1.190310556
Mean: 6.729166667
Median Absolute Deviation (MAD): 0.7355947293
Skewness: -0.7045209798
Sum: 6298.5
Variance: 0.8746457219

Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[1.9 3.8 4.55 5.15 5.65 6.15 7.35 8.15 8.55 9. ], "bayesian blocks" binning strategy used)

Common values

Value | Count | Frequency (%)
6.7 | 47 | 5.0%
7 | 44 | 4.7%
7.1 | 44 | 4.7%
6.3 | 41 | 4.4%
7.3 | 40 | 4.3%
7.8 | 39 | 4.2%
6.6 | 39 | 4.2%
7.2 | 39 | 4.2%
6.5 | 37 | 4.0%
6.2 | 36 | 3.8%
Other values (45) | 530 | 56.6%

Minimum 5 values

Value | Count | Frequency (%)
1.9 | 1 | 0.1%
2.7 | 1 | 0.1%
3.2 | 1 | 0.1%
3.5 | 2 | 0.2%
3.7 | 1 | 0.1%

Maximum 5 values

Value | Count | Frequency (%)
9 | 1 | 0.1%
8.8 | 1 | 0.1%
8.6 | 3 | 0.3%
8.5 | 6 | 0.6%
8.4 | 3 | 0.3%
 

Votes
Real number (ℝ≥0)

Distinct count: 933
Unique (%): 99.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 175270.21688034188
Minimum: 61
Maximum: 1791916
Zeros: 0
Zeros (%): 0.0%
Memory size: 7.4 KiB

Quantile statistics

Minimum: 61
5-th percentile: 1586.25
Q1: 41593
Median: 114918.5
Q3: 249538
95-th percentile: 530938.75
Maximum: 1791916
Range: 1791855
Interquartile range (IQR): 207945

Descriptive statistics

Standard deviation: 190582.4207
Coefficient of variation (CV): 1.08736341
Kurtosis: 11.27174861
Mean: 175270.2169
Median Absolute Deviation (MAD): 137161.3218
Skewness: 2.493379996
Sum: 164052923
Variance: 3.632165907e+10

Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[6.1000000e+01 3.8450000e+02 2.4555000e+03 9.2660000e+03 1.1611500e+05 2.2172900e+05 3.5732450e+05 5.9080900e+05 1.0466675e+06 1.7919160e+06], "bayesian blocks" binning strategy used)

Common values

Value | Count | Frequency (%)
291 | 2 | 0.2%
97141 | 2 | 0.2%
1427 | 2 | 0.2%
125693 | 1 | 0.1%
406219 | 1 | 0.1%
299718 | 1 | 0.1%
461509 | 1 | 0.1%
92868 | 1 | 0.1%
240323 | 1 | 0.1%
101058 | 1 | 0.1%
Other values (923) | 923 | 98.6%

Minimum 5 values

Value | Count | Frequency (%)
61 | 1 | 0.1%
102 | 1 | 0.1%
115 | 1 | 0.1%
164 | 1 | 0.1%
173 | 1 | 0.1%

Maximum 5 values

Value | Count | Frequency (%)
1791916 | 1 | 0.1%
1583625 | 1 | 0.1%
1222645 | 1 | 0.1%
1047747 | 1 | 0.1%
1045588 | 1 | 0.1%
 

Revenue
Real number (ℝ≥0)

Distinct count: 789
Unique (%): 84.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 80.78591346153847
Minimum: 0.01
Maximum: 936.63
Zeros: 0
Zeros (%): 0.0%
Memory size: 7.4 KiB

Quantile statistics

Minimum: 0.01
5-th percentile: 0.3275
Q1: 17.5225
Median: 47.985
Q3: 102.4225
95-th percentile: 292.075
Maximum: 936.63
Range: 936.62
Interquartile range (IQR): 84.9

Descriptive statistics

Standard deviation: 99.49466277
Coefficient of variation (CV): 1.231584301
Kurtosis: 11.92658652
Mean: 80.78591346
Median Absolute Deviation (MAD): 68.18833642
Skewness: 2.76636849
Sum: 75615.615
Variance: 9899.187921

Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[1.00000e-02 5.50000e-02 3.35000e-01 4.30500e+00 4.79675e+01 ... 1.03085e+02 1.70370e+02 2.60890e+02 4.23840e+02 9.36630e+02], "bayesian blocks" binning strategy used)

Common values

Value | Count | Frequency (%)
47.985 | 99 | 10.6%
0.03 | 5 | 0.5%
0.04 | 4 | 0.4%
0.32 | 4 | 0.4%
0.05 | 4 | 0.4%
0.02 | 4 | 0.4%
0.01 | 4 | 0.4%
0.54 | 3 | 0.3%
2.2 | 3 | 0.3%
0.15 | 3 | 0.3%
Other values (779) | 803 | 85.8%

Minimum 5 values

Value | Count | Frequency (%)
0.01 | 4 | 0.4%
0.02 | 4 | 0.4%
0.03 | 5 | 0.5%
0.04 | 4 | 0.4%
0.05 | 4 | 0.4%

Maximum 5 values

Value | Count | Frequency (%)
936.63 | 1 | 0.1%
760.51 | 1 | 0.1%
652.18 | 1 | 0.1%
623.28 | 1 | 0.1%
533.32 | 1 | 0.1%
 

Metascore
Real number (ℝ≥0)

Distinct count: 84
Unique (%): 9.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 58.98504273504273
Minimum: 11.0
Maximum: 100.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 7.4 KiB

Quantile statistics

Minimum: 11
5-th percentile: 31
Q1: 47
Median: 59.5
Q3: 72
95-th percentile: 85
Maximum: 100
Range: 89
Interquartile range (IQR): 25

Descriptive statistics

Standard deviation: 17.19475702
Coefficient of variation (CV): 0.2915104614
Kurtosis: -0.6122051468
Mean: 58.98504274
Median Absolute Deviation (MAD): 14.21000895
Skewness: -0.1238873467
Sum: 55210
Variance: 295.6596691

Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 11. 21. 29.5 45.5 83.5 88.5 100. ], "bayesian blocks" binning strategy used)

Common values

Value | Count | Frequency (%)
66 | 25 | 2.7%
72 | 25 | 2.7%
68 | 25 | 2.7%
64 | 24 | 2.6%
57 | 23 | 2.5%
51 | 22 | 2.4%
65 | 22 | 2.4%
48 | 21 | 2.2%
81 | 21 | 2.2%
76 | 21 | 2.2%
Other values (74) | 707 | 75.5%

Minimum 5 values

Value | Count | Frequency (%)
11 | 1 | 0.1%
15 | 1 | 0.1%
16 | 1 | 0.1%
18 | 4 | 0.4%
19 | 1 | 0.1%

Maximum 5 values

Value | Count | Frequency (%)
100 | 1 | 0.1%
99 | 1 | 0.1%
98 | 1 | 0.1%
96 | 4 | 0.4%
95 | 3 | 0.3%
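Several of the statistics reported for each numeric variable are related by simple identities: the range is the maximum minus the minimum, the IQR is Q3 minus Q1, the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, and the mean is the sum divided by the number of observations. The following is a minimal sketch that checks these identities against the figures reported for df_index; the variable names are illustrative, and small differences are expected because the report prints rounded values.

```python
# Values copied from the df_index section of this report.
n = 936
minimum, maximum = 0, 999
q1, q3 = 245.75, 745.25
mean = 497.1858974
std = 288.1005611
total = 465366

value_range = maximum - minimum   # reported Range: 999
iqr = q3 - q1                     # reported IQR: 499.5
cv = std / mean                   # reported CV: 0.5794624558
variance = std ** 2               # reported Variance: 83001.93332
mean_from_sum = total / n         # reported Mean: 497.1858974

for name, value in [("range", value_range), ("IQR", iqr), ("CV", cv),
                    ("variance", variance), ("mean", mean_from_sum)]:
    print(f"{name}: {value:.10g}")
```

The same checks can be repeated for any other numeric variable by substituting its reported values.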
 

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for a linear relationship the steepness of the line does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
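The calculation described above can be sketched in plain Python. This is a minimal illustration, not the code used to produce this report; the function name `pearson_r` and the sample data are assumptions made for the example. The data mimics the relationship visible in the sample rows, where Rank equals df_index plus one.

```python
import math

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors of the covariance and the standard deviations cancel,
    # so plain sums of products and squares are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / math.sqrt(ss_x * ss_y)

# Illustrative data only (not rows from the dataset).
x = [0, 1, 2, 3, 4, 5]
r_shift = pearson_r(x, [v + 1 for v in x])        # change of location
r_scaled = pearson_r(x, [10 * v - 3 for v in x])  # change of location and scale
r_neg = pearson_r(x, [-v for v in x])             # reversed direction

print(r_shift, r_scaled, r_neg)
```

Shifting or rescaling one variable leaves r at +1, which is why a pair such as df_index and Rank is flagged as highly correlated; reversing the direction gives -1.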

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
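In other words, ρ is Pearson's r applied to ranks. The following is a minimal sketch in plain Python, assuming tied values receive the average of the ranks they span (a common convention, not something this report specifies); the helper names and the sample data are illustrative.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    """Pearson's r computed from centered sums."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r of the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Illustrative data only: y grows monotonically but not linearly with x.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(rho, r)
```

For this data ρ equals 1 because the ordering is preserved exactly, while Pearson's r stays below 1 because the relationship is not linear.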

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
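The pair counting described above can be sketched as follows. This is a minimal illustration of the formula exactly as stated (often called tau-a), in which tied pairs count as neither concordant nor discordant; statistical libraries commonly apply a tie-adjusted variant, so their results can differ when ties are present. The function name and the sample data are assumptions made for the example.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs."""
    points = list(zip(xs, ys))
    n = len(points)
    if n < 2:
        raise ValueError("need at least two observations")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(points, 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move the same way
        elif direction < 0:
            discordant += 1   # the variables move in opposite ways
        # direction == 0: a tie, counted as neither
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs

# Illustrative data only: 7 concordant and 3 discordant pairs out of 10.
x = [1, 2, 3, 4, 5]
y = [3, 1, 2, 5, 4]
tau = kendall_tau_a(x, y)
print(tau)
```

The loop visits every pair once, so the cost grows quadratically with the number of observations.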

Missing values

Sample

First rows

(row index) | df_index | Rank | Title | Genre | Director | Year | Runtime | Rating | Votes | Revenue | Metascore
0 | 0 | 1 | Guardians of the Galaxy | Action,Adventure,Sci-Fi | James Gunn | 2014 | 121 | 8.1 | 757074 | 333.130 | 76.0
1 | 1 | 2 | Prometheus | Adventure,Mystery,Sci-Fi | Ridley Scott | 2012 | 124 | 7.0 | 485820 | 126.460 | 65.0
2 | 2 | 3 | Split | Horror,Thriller | M. Night Shyamalan | 2016 | 117 | 7.3 | 157606 | 138.120 | 62.0
3 | 3 | 4 | Sing | Animation,Comedy,Family | Christophe Lourdelet | 2016 | 108 | 7.2 | 60545 | 270.320 | 59.0
4 | 4 | 5 | Suicide Squad | Action,Adventure,Fantasy | David Ayer | 2016 | 123 | 6.2 | 393727 | 325.020 | 40.0
5 | 5 | 6 | The Great Wall | Action,Adventure,Fantasy | Yimou Zhang | 2016 | 103 | 6.1 | 56036 | 45.130 | 42.0
6 | 6 | 7 | La La Land | Comedy,Drama,Music | Damien Chazelle | 2016 | 128 | 8.3 | 258682 | 151.060 | 93.0
7 | 7 | 8 | Mindhorn | Comedy | Sean Foley | 2016 | 89 | 6.4 | 2490 | 47.985 | 71.0
8 | 8 | 9 | The Lost City of Z | Action,Adventure,Biography | James Gray | 2016 | 141 | 7.1 | 7188 | 8.010 | 78.0
9 | 9 | 10 | Passengers | Adventure,Drama,Romance | Morten Tyldum | 2016 | 116 | 7.0 | 192177 | 100.010 | 41.0

Last rows

(row index) | df_index | Rank | Title | Genre | Director | Year | Runtime | Rating | Votes | Revenue | Metascore
926 | 988 | 989 | Martyrs | Horror | Pascal Laugier | 2008 | 99 | 7.1 | 63785 | 47.985 | 89.0
927 | 990 | 991 | Underworld: Rise of the Lycans | Action,Adventure,Fantasy | Patrick Tatopoulos | 2009 | 92 | 6.6 | 129708 | 45.800 | 44.0
928 | 991 | 992 | Taare Zameen Par | Drama,Family,Music | Aamir Khan | 2007 | 165 | 8.5 | 102697 | 1.200 | 42.0
929 | 993 | 994 | Resident Evil: Afterlife | Action,Adventure,Horror | Paul W.S. Anderson | 2010 | 97 | 5.9 | 140900 | 60.130 | 37.0
930 | 994 | 995 | Project X | Comedy | Nima Nourizadeh | 2012 | 88 | 6.7 | 164088 | 54.720 | 48.0
931 | 995 | 996 | Secret in Their Eyes | Crime,Drama,Mystery | Billy Ray | 2015 | 111 | 6.2 | 27585 | 47.985 | 45.0
932 | 996 | 997 | Hostel: Part II | Horror | Eli Roth | 2007 | 94 | 5.5 | 73152 | 17.540 | 46.0
933 | 997 | 998 | Step Up 2: The Streets | Drama,Music,Romance | Jon M. Chu | 2008 | 98 | 6.2 | 70699 | 58.010 | 50.0
934 | 998 | 999 | Search Party | Adventure,Comedy | Scot Armstrong | 2014 | 93 | 5.6 | 4881 | 47.985 | 22.0
935 | 999 | 1000 | Nine Lives | Comedy,Family,Fantasy | Barry Sonnenfeld | 2016 | 87 | 5.3 | 12435 | 19.640 | 11.0